Statistical data processing in clinical proteomics.

Authors

  • Suzanne Smit
  • Huub C J Hoefsloot
  • Age K Smilde
Abstract

This review discusses data analysis strategies for the discovery of biomarkers in clinical proteomics. Proteomics studies produce large data sets in which many variables are measured on few samples. A wealth of classification methods exists for extracting information from the data. Feature selection plays an important role in reducing the dimensionality of the data prior to classification and in discovering biomarker leads. Which classification strategy works best remains an open question. Validation is a crucial step in moving biomarker leads towards clinical use. Here we discuss only statistical validation, recognizing that biological and clinical validation are of the utmost importance. First, validated model selection is needed to develop a generalized classifier that predicts new samples correctly. A cross-validation loop wrapped around the model development procedure assesses performance on unseen data. The significance of the model should be tested; we use permutations of the data for comparison with uninformative data. This procedure also tests the correctness of the performance validation. Preferably, a new set of samples is measured to test the classifier and to rule out results specific to a machine, analyst, laboratory or the first set of samples. This is not yet standard practice. We present a modular framework that combines feature selection, classification, biomarker discovery and statistical validation; all of these data analysis aspects are discussed in this review. The feature selection, classification and biomarker discovery modules can be incorporated or omitted according to the researcher's preference. The validation modules, however, should not be optional. In each module, the researcher can select from a wide range of methods, since no single route leads to the correct model and proper validation. We discuss many possibilities for feature selection, classification and biomarker discovery. For validation we advise a combination of cross-validation and permutation testing, a validation strategy supported in the literature.
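
The validation strategy described in the abstract (cross-validation wrapped around the whole model development procedure, followed by a permutation test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-class problem, a simple mean-difference filter for feature selection, a nearest-centroid classifier, and synthetic data; every function name below is hypothetical and chosen only for illustration.

```python
import random
import statistics

def select_features(X, y, k):
    """Rank features by class-mean difference scaled by spread; return top-k indices."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def fit_centroids(X, y, feats):
    """Nearest-centroid model: per-class mean of each selected feature."""
    cents = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        cents[lab] = [statistics.fmean([r[j] for r in rows]) for j in feats]
    return cents

def predict(cents, feats, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(lab):
        return sum((row[j] - c) ** 2 for j, c in zip(feats, cents[lab]))
    return min((0, 1), key=dist)

def cv_accuracy(X, y, k_feats=5, n_folds=5, seed=0):
    """Cross-validation wrapped around feature selection AND classifier training.

    Assumes every training fold contains samples of both classes.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for test in folds:
        held_out = set(test)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        # Feature selection sees training samples only, never the held-out fold.
        feats = select_features(X_tr, y_tr, k_feats)
        cents = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(cents, feats, X[i]) == y[i] for i in test)
    return correct / len(X)

def permutation_p_value(X, y, n_perm=200, seed=0, **cv_kwargs):
    """Compare the real CV accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, **cv_kwargs)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break the link between data and class labels
        if cv_accuracy(X, y_perm, **cv_kwargs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def make_demo_data(n_per_class=20, n_features=50, n_informative=3, shift=2.0, seed=1):
    """Synthetic stand-in for a proteomics matrix: few samples, many variables."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            for j in range(n_informative):
                row[j] += shift * lab  # only the first few features carry signal
            X.append(row)
            y.append(lab)
    return X, y

if __name__ == "__main__":
    X, y = make_demo_data()
    acc, p = permutation_p_value(X, y, n_perm=100)
    print(f"cross-validated accuracy: {acc:.2f}, permutation p-value: {p:.3f}")
```

The key design choice in this sketch is that feature selection is repeated inside every fold. Selecting features on the full data set before cross-validation would leak information from the held-out samples, and the permutation test is one way to expose such a leak: with shuffled labels, a correctly validated procedure should perform close to chance.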


Related resources

The Relationships of Sensory Processing Sensitivity and Information Processing Styles to Clinical Symptoms in Drug-Dependent Individuals

Objective: The aim of this study was to examine the roles of sensory processing sensitivity and information processing styles in predicting clinical symptoms in drug-dependent individuals. Method: The current study was correlational-descriptive. The statistical population included all drug-dependent individuals referred to addiction treatment centers in Tabriz. Among them, 290 people were selec...

UvA-DARE (Digital Academic Repository): Statistical data processing in clinical proteomics

Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: http://uba.uva.nl/en/contact, or a letter to: Library of ...

Emotion Processing in Patients with Early- and Late-Onset Temporal Lobe Epilepsy

Objective: Temporal Lobe Epilepsy (TLE) can contribute to various emotional symptoms by damaging the temporal lobe. This study aimed at investigating emotion processing in patients with early- and late-onset TLE compared to a healthy group. Methods: In this causal-comparative study, 60 patients with diagnosed TLE were compared to 60 healthy controls to identify emotion processing styles. The d...

Data pre-processing in liquid chromatography-mass spectrometry-based proteomics

MOTIVATION In a liquid chromatography-mass spectrometry (LC-MS)-based expressional proteomics, multiple samples from different groups are analyzed in parallel. It is necessary to develop a data mining system to perform peak quantification, peak alignment and data quality assurance. RESULTS We have developed an algorithm for spectrum deconvolution. A two-step alignment algorithm is proposed fo...

Statistical analysis of proteomics data

Proteomic mass spectrometry profiling is becoming an increasingly important tool in clinical diagnostics, for example to identify biomarkers for cancer. As with other high-throughput technologies, sophisticated statistical algorithms are essential in the analysis of spectrometry data. In my talk I will discuss the statistical and algorithmic challenges involved in the analysis of clini...

New statistical algorithms for clinical proteomics

Background: Mass spectrometry based screening methods have been recently introduced into clinical proteomics. This boosts the development of a new approach for early disease detection: proteomic pattern analysis. Aim: Find, analyze and compare proteomic patterns in groups of patients having different properties such as disease status or epidemiological parameters (e.g. sex, age) with a new pip...

Journal:
  • Journal of chromatography. B, Analytical technologies in the biomedical and life sciences

Volume 866, Issues 1-2

Pages: -

Publication date: 2008